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Abstract 

In the Bayesian community, an ongoing imperative is to develop efficient 
algorithms. An appealing approach is to form a hybrid algorithm by combining 
ideas from competing existing techniques. This paper addresses issues in design- 
ing hybrid methods by considering selected case studies: the delayed rejection 
algorithm, the pinball sampler, the Metropolis adjusted Langevin algorithm, 
and the population Monte Carlo algorithm. We observe that even if each com- 
ponent of a hybrid algorithm has individual strengths, they may not contribute 
equally or even positively when they are combined. Moreover, even if the sta- 
tistical efficiency is improved, from a practical perspective there are technical 
issues to be considered such as applicability and computational workload. In 
order to optimize performance of the algorithm in real time, these issues should 
be taken into account. 

Keywords : Delayed rejection method, importance sampling, Markov chain 
Monte Carlo, Metropolis-Hastings algorithm. Metropolis-adjusted Langevin al- 
gorithm, pinball sampler and population Monte Carlo 



1 



1 Introduction 



Marko v ch ain Monte Carlo (MCMC) methods, originally proposed by [Metropolis et al 
(|1953 '1 and'H asting£i()l970l) . are designed to generate Markov chains with a given 
stationary distribution. MCMC methods have the great advantage that they 
apply to a broader class of multivariate problems than methods based on in- 
dependent sampling. With these convenient features, MCMC methods have 
had a large impact on Bayesian statistics, in particular the computation of pre- 
viously intractable posterior distributions. Over the last decade, a great deal 
of literature has been published displaying the capability of MCMC in deal- 
ing with Bayesian solutions to realist i c pro blems in a wide variety of areas 
i Cappe and Robertl . I2OO0I : iBesag et"al] . ll995[ l. 



Another well known scheme in the Bayesian community is the importance 
sampling (IS) based algorithm. Instead of producing a Markov chain, this al- 
gorithm produces almost unbiased estimates using particles with associated im- 
portance weights at each iteration. These methods are extensively used to 
solve sequential Bayesian inference problem s in which the target evolves over 
time (Liu and Chen . 1998 : Douc et al. . 2001 ). For static m'oblems ICappe et al 



( 20041) extended IS to Population Monte Carlo (PMC) by allowing the im- 
portance function to adapt to the target in an iterative manner. This has 
been shown to be a computationally cheaper alternative to MCMC algorithms 
20061) . 



( Celeux et al. 



Over the years there has been a substantial amount of progress on the fun- 
damental ideas of designing efficient algorithms and the theoretical properties 
of these methods of simulation. Each algorithm has different strengths and can 
be categorized by meeting different criteria based on the statistical properties 
of the simulated Markov chain. It is therefore natural that the question arises 
whether a better scheme can be developed by combining the best aspects of ex- 
isting algorithms. In the following, we define the realisation of such an approach 
as a hybrid algorithm. 



A hybrid algorithm can be designed from different perspectives by a variety 
of choices of algorithms to combine and the way in which they are combined. 
The primary motivation is to propose an efficient algorithm that overcomes 
identified weakness in the individual algorithms. It is of course important to 
define efficiency in this context. Existin g criteria are primarily ba sed on the 
statistical properties of the Markov chain (jAndrieu and Robertl . 120011 ) . but there 
are also computing and applicability issues to be considered in implementing 
algorithms for realistic problems. In this paper we consider algorithms in a more 
practical manner, and focus on three criteria: statistical efficiency, applicability 
and implementation. These are now described in more detail. 

Statistical efficiency: This criterion is subject to the statistical properties of 
the generated Markov chain, such as the rate of convergence and the mix- 



2 



ing speed of chains. Although classical studies ensure that the MCMC 
method guarantees convergence to target, this simply may not be enough 
to guarantee convergence in real time. The work in recent years has 
thus focused on theoretical developments of MCMC samplers to maxi- 
mize efficiency. Established results for MC MC algorithms include g e omet- 
ric ergodicity for Metrop olis algorithms ( Mengersen and Tweedie . 19961 : 
Jarner and Hansen . 2000f). and expon ential ergodicity for the Langevin al- 



gorithm (jRoberts and Tweedid . 119961 ). When the chain is updated via an 



accept-reject setup, t he efficiency has been characterised by deriving opti- 
mal acceptance rates ( Roberts and Tweediel 1996b : Roberts and Gelman 
1997t [Roberts and Rosenthal l200ll )" 



There also has been considerable attention on the mixing of a simulated 
Markov chain. A chain with a slower rate of mixing is more strongly de- 
pendent on the initial value and may easily become stuck in certain part of 
the state space for multimodal problems. This becomes a severe problem 
in generating reliable samples from the tar get as the dimension increases . 
Mechanisms such as the repulsive proposal ( Mengersen and Robert liooi), 
geometric dete rminant proposal tMe n gersen and Robert . 2Q03) , and de- 



;al il 

layed rejection (jTiernev and MiraL Il999l ) are proposed to improve the rate 



of mixing and will be examined in more detail later. 

For the IS based algorithms the convergence of estimates to their expected 
values is guaranteed as the number of particles increases. In practice 
limited numbers of particles are used and the degeneracy of importance 
weights is an unavoidable drawback. Attemp ts devoted t o limit degen- 



eracy and reduce the variance can be found in lDouc et al 
others. 



among 



Applicability: In general a common feature of popular algorithms is that they 
are applicable i n a diverse range of realistic settings. For example, the 

jl984 ) and the Metro polis- Hastings 
1970l) enjoy univer- 



Gibbs sampler (iGeman and Geman 



algorithm (MHA) ([Metropolis et al 



Il953t [Hastings . 



sal popularity because of their simpl e setting and i n the latter case, flexible 



choice of the prop osal distribution ([Besag et al.l . Il995t iBesag and Greed . 
199.4 [Geveij . [l992 h. Well known hybrid a.pproaches based on the MHA are 



the delayed rejection algorithm (DRA) ( Tiernev and Miral . [l999) , the re- 
versible jump MCMC for va riable dimension models (RJMCMC) ([Green . 

[2001[ :[g rccn and Mira. '2001"), the Metropolis adjusted Langevin algorithm 

(MALA) ( Roberts and Twccdic. 1996.) . and the pinball sampler (PS) (Mengersen and Robert! . 
[20031 ). 



Implementation: This criterion addresses computational issues such as the 
level of difficulty in coding, memory storage required and the demand on 
computation in real time. 
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In the MCMC literature, theoretical studies of statistical efficiency (first 
criterion above) are typically based on the number of iterations, not actual 
CPU time. From the perspective of practical implementation, the efficient 
sampling performance of an algorithm in real time (seconds) involves both 
the statistical efficiency and CPU time consumed in implementing the 
algorithm. Thus optimal performance may involve a compromise between 
the increase in computing workload and the gain in statistical efficiency. 

In this paper, we focus on selected hybrid approaches: the DRA with difi'er- 
ent types of moves for the proposal, the MALA, the PS, and the PMC, and make 
comparisons between the hybrid algorithms and the component algorithms on 
which they are based with respect to three criteria defined above. These algo- 
rithms are described in Section 2. Section 3 summarizes results of simulation 
studies designed to facilitate the evaluation and Section 4 describes the appli- 
cation of the algorithms to a real problem of analyzing particle size data. The 
paper concludes with a discussion in Section 5. 



2 Hybrid Approaches 

A considerab le amount of attention has been devoted to the MHA which was 
develo ped bv [Metropolis et aL I (ll953l) and subsequently generalized bv lHasting^ 



( 1970 h . The advantages of the MHA are that it is straightforward to implement 



and is flexible with respect to the proposal density. However, it may take an 
unrealistically long time to explore high dimensional state s paces and tuning pa- 



rame ters involved with proposal density could be difficult (jRobert and Casella , 



20041) . In order to improve the performance of this algorithm, various forms 
of MHA samplers have been introduced. Before we study hybrid MCMC ap- 
proaches with the MHA, we give a brief summary of the MHA itself. 

The MHA associated with a stationary density tt and a Markov kernel qi 
produces a Markov chain {6''^*^}^o through the following transition. 



Metropolis-Hastings Algorithm (MHA) 

1 Generate the initial states from an initial distribution p, 9^^") ^ Pi')- 

2 For t = I, . . . ,r , 

2.1 Generate a candidate state (p from with some distribution qi 

^^gi(0(*-i),.) . 

2.2 Calculate an acceptance probability ai 
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2.3 Accept with probability ai{9^* ^\ (p). If tp is accepted, set 0*^*^ — Lp. 
Otherwise set Q^^^ = 6i(*-i). 



After a sufficient number of iterations, the initial state is forgotten and 
the algorithm generates samples from the target density tt. The Markov chain 
{ ^(*H?^n converges to th e target tt at a given rate ( Roberts and Tweedie . 1996bl : 
iMengersen and Tweedid . ll996i V A useful feature of the MHA is that it can be 
applied even when the target is only known up to a normalizing constant through 
the ratio Tr/gi. 



The random walk MHA in which q\(}p, 
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(6'(* , Lp) is perhaps 



the simplest version of the MHA, since the probability of acceptance reduces 
to Oi\ = min{l, 7r((^)/7r(6''^*^^^)}. However, the s ize of the proposal variance 



1997 



is critical to the performanc e of the algorithm (jRoberts and Gelman 
Roberts and Rosenthall l200l[) . If it is too large, it is possible that the chain 



may remain stuck at a particular value for many iterations, while if it is too 
small the chain will tend to make small jumps and move inefficiently through 
the support of the target distribution. 



2.1 Particle system in the MCMC context 

The MHA, like most MCMC algorithms to date, produces a single Markov chain 
simulation and monitoring focuses on its convergence of the chain to the tar- 
get. An alternative particle system based on importance sampling (IS) can be 
used to approximate the posterior expectation of estimates of interest. The key 
idea of the particle system is to represent the required distribution by a set of 
random samples (^p''-^\p2 \ ■ ■ ■ tP^^^^ with associated importance weights and 
to compute estimates based on these samples and weights. 



In the MCMC context, instead of using the IS the whole random vec- 
tor ^6'f\02*^''' j^TV^) is resampled at each iteration according to a Marko- 
vian updating process. Once particles have eliminated their dependence on 
initial values, they can be considered as iid samples from the target at any 
given time t\ in contra s t, a l ong run is required in regular MCMC sampling. 



Mengersen and Robert! ([2003!) showed that an MCMC algorithm generates the 



production of samples of size N from tt and a product distribution tt*^ is not 
fundamentally different from the production of a single output from tt. 



The simplest version of a particle system in an MCMC context is a parallel 
run of independent MCMC algorithms. In this type of implementation the re- 
sult and chains are considered individually rather than as a whole vector. The 
paralle l system can be found in th e literature on coupling metho ds for Markov 
chains (jBrever and Robertsl . [2000l) . [m engersen and Robert! (|2003l ) proposed up- 
dating schemes, the repulsive proposal and the geometric determinant proposal, 
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that allow interaction between particles. These schemes are in essence hybrid 
algorithms, since they combine existing algorithms, and will be examined in 
detail later. 



2.2 Metropolis-adjusted Langevin algorithm (MALA) 

The random walk MHA has an advantage that it moves independently of the 
shape of tt and is easy to implement. However, an algorithm that is more 
specifically tailored to tt may converge more rapidly. This leads to the MALA 
that applies a Langevin diffusion approximation using the information about 
the target density (in th e form of the derivative of log tt) to the MHA structure 
( Besag and Green . 19931 ). Assuming that the target distribution tt is everywhere 
non-zero and differentiable so that the derivative of log tt is well defined, Lp is 
generated from 



qi{e^'-^\ [e^'-^^ + iwiog^(0(*-i)), 



(2) 



As with the traditional Euler 



where /i is a scaling parameter. The proposal is accepted with probability 

method, scaling the step size h is important. If h is t oo large or too small 
the c h ain may never c onverge to the target in real time ([Roberts and Stramer 
2002[) . lAtchadel (|2Q06f ) showed that h can be tuned adaptively. 



The idea of shapi ng the can d idate density based on the target was intro- 
duced as long a go as iDoll et al.l (119781) and in the prob abilistic literature h as 
been studied bv lRoberts and TweediJ(ll996llbll. Istramcr and Tweedid (|l999l lbh 
and others. In discrete time space, Roberts and Tweedie ( 19961 ) proved that the 
MALA is exponentially ergodic when the tails of the target density are heavier 
than Gaussian. This is an advantage of MALA over the MHA that is geomet- 
rically ergodic. With a fast convergence rate the algorithm may be efficient for 
sampling within a single mode, but it can still fail to explore more than a single 
mode of the distribution. This is a major weakness of the MALA since realistic 
problems are often multimodal. Moreover, in practice VlogTr can be expensive 
to compute and sometimes the first partial derivative form is not straightfor- 
ward to evaluate. 



As an attempt to cross low probability regions between modes, ISkare et al 



used a smoothed Langevin proposal based on a function approximation 
and lCeleux et al.l (|2000( ) demonstrated a tempering scheme using Langevin al- 
gorithms. 



2.3 Delayed rejection algorithm (DRA) 

In the regular MHA, whenever a proposed candidate is rejected, the chain re- 
tains the same position so that 6'^*-' = O'-^^^K This increases the autocorrela- 
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tion alonR the rea l ized p ath and thus increases the variance of the estimates. 
Tiernev and Mira (|l99 9^ proposed the DRA to amehorate this effect. The cen- 
tral idea behind this algorithm is that if the candidate is rejected, a new can- 
didate is proposed with an acceptance probability tailored to preserve the sta- 
tionary distribution. Thus at the i"^ iteration, if a candidate is rejected with 
probability q;i(0'^*~^\ (/?), a proposal -d is constructed from a possibly different 
proposal density q2{Q'^^~^\ 'f, •) and •& is accepted with probability 



Q2(6'^*"^',(/?,i?) = min<{ 1, 



7r(6'(*-i))qi(6'(*-i), ^)g2(f - ai(6K*-i) , (^)] 

The algori thm can be extended to allow proposals to be made after multiple 
rejections (|Mirall200il) . 



This scheme does not guarantee an automatic increase in the acceptance 
probability. However, once a better candidate is accepted, it induces an in- 
crease in the rate of acceptance and a reduction of the autocorrelation in the 
constructed Markov chain. 



The drawback of the DRA is that the amount of computation and program 
workload increases geometrically with the number of delayed rejection attempts. 
The number of attempts may depend on the typ e of problem, model, and core 
questions to be answered. iGreen and Mira ( 2001 ) concluded that the workload 
required for three or more attempts was not worthwhile based on their limited 
experiments. 

Since the proposal distributions (gi and qi) are not constrained to be the 
same, various types of proposal distri butions can be constructed using infor- 
mation from rejected proposals. Green and Miral ([2001,) developed a hybrid 
algorithm by applying the DRA to the reversible j ump algorithm for trans- 
dimensional problems, and iMengersen and Robert! hm± suggested the pro- 
posal scheme in a two dimensio nal space. We consider here th e geometric deter- 
ministic proposal suggested bv IMengersen a.nd Robert ( 2003 ) and the Langevin 
proposal originally suggested bv iMira et al.l (|2004l ). 



Geometric deterministic proposal ([Mengersen and Robertl . 120031 ) : The 

idea is to push particles away using symmetry with respect to a line defined 
by the closest particle and rejected particle. Suppose that the proposal, 
H>i ~ <li{(^i, ) is rejected. The second proposal z9i is generated by taking 
the reflection of 9i with respect to the line connecting 6* and (pi where 6* 
is the closest particle to tpi among 6i, - ■ ■ 9i-i, ^i+i, • • ■ , 0^. This is the 
so-called pinball effect. 



Langevin proposal : If the current proposal tp is rejected, the second proposal 
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■d is generated according to 



i\ •) = (^(^ + i W log 7r(^), 
2.4 Pinball Sampler (PS) 



Mengersen and Robert! tOQ^ introduced a hybrid algorithm, the PS, which is 



a combination of the MHA, delayed rejection mechanism, particle system, and 
repulsive proposal. It is an updating system for a particle system that is based 
on a standard random walk with correction to avoid the immediate vicinity of 
other particles. 

Unlike traditional particle systems, neither importance sampling schemes 
nor weights are used. At each iteration, the whole vector \ 0'^\ . . . , ^'tv'') is 
updated. Thus the algorithm produces iid samples from the target, which is an 
advantage over MCMC methods. Moreover, at each vector update the repulsive 
mechanism discourages particles from being too close and hence avoids possible 
degeneracy problems. 

We now study the properties of the repulsive proposal density in order to 
understand the mechanics of the PS. 



Repulsive proposal (RP) The RP suggested bv lMengersen and Robert! (|2nn3h 

uses the following pseudo distribution, 

7r«(0O cx ^(0,) []e-«/-(«^)ll«'-«^H' (3) 
where ^ is a tempering factor. 

The pseudo distribution, ir^, is derived from the distribution of interest 
TT by multiplying exponential terms that create holes around the other 
particles 9j, {i ^ j)- These thus induce a repulsive effect around the other 
particles. The factor Tr{9j) in the exponential moderates the repulsive ef- 
fect in zones of high probability and enhances it in zones of low probability. 
If the tempering factor, ^, is large, tt^ is dominated by the repulsive term 
and becomes very different from the target distribution tt. If ^ is small, 
there is a negligible repulsive effect on tt^, so tt^ « tt. 

The MHA is easily adapted to include the repulsive proposal through two 
acceptance steps, firstly with tt^ and secondly with the true posterior tt 
to ensure convergence to the target distribution. The repulsive proposal 
algorithm is as follows. 



Metropolis Hastings algorithm with the repulsive proposal 
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1 Generate initial particles from an initial distribution p, ~ p{-). 

2 Fori = 1,...,T, 

For i = l,...,N, 

2.1 Generate ipi ~ qi{0f~^\-). 

2.2 Propose Step 

Determine the probability using tt^, 

•-•'=-°|\«(r"),.(r-,..)r 

Generate r ^ ?7[0, 1]. If r < , go to Correction Step. Otherwise 

set ef^ = et'^. 

2.3 Correction Step 

Implement a final Metropolis acceptance probability calibrated 
for the target distribution tt, 

Accept ipi, with probability ai. If tpi is accepted, set = 
otherwise set 0^^^ = 9^^~^\ 



The tempering factor ^ can be calibrated during the simulation against 

the number of particles N and the acceptance rate. The value should be 
tuned such that proposals are easily rejected if they are generated from 
low probability regions or are too close to existing particles. 

Borrowing the idea from the DRA, two or more attempts at updating may 
be pursued using the information from rejected proposals, and the delayed re- 
jection step can easily be included in this algorithm. 

The two-stage PS algorithm for generating from a target distribution tt as 
follows. 



Pinball Sampler (PS) 

1 Generate initial particles from an initial distribution p, ~ p(-). 

2 For f = l,...,r, 

For i = !,■■■ ,N, 

2.1 First stage 
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2.1.1 Generate (^^ - 51(6*^ ^\-). 

2.1.2 Generate ri ^ [/[0, 1]. If ri < al{0^*~^\ ipi) , accept ipi as in 

the Correction Step. Either ri > a*(6'-*~^\ (yCi) or ipi is rejected 
with TT, go to the Second stage. 

2.2 Second stage 

2.2.1 Draw a candidate t9i <l2{Qf ^K'Pi-,-)- Here the (72 is the 
geometric deterministic proposal. 

2.2.2 Generate - ?7[0,1]. If rs < 0^(6*^ (^,, accept z?, as 
in the Correction Step. 



Note that other proposals can be considered for (72 for different problems. 
In particular, some stochasticity could augment the deterministic proposal de- 
scribed above. 



2.5 Population Monte Carlo algorithm (PMC) 

The PMC algorithm bv lCappe et al.l (|2004l ) is an iterated IS scheme. The major 
difference between the PMC algorithm and earlier work on particle systems is 
that the PMC algorithm uses a resample step to draw samples from a target 
that does not evolve with time. 



Population Monte Carlo (PMC) algorithm 

1 Initialization, 

Generate initial particles from an initial distribution 9^^^ ^ 

2 Foi t=l,...,T, 

2.1 Construct the importance function g^*'^ and sample (pi\--- , V'iv^ 

2.2 Compute the importance weight 




2.3 Normalize the importance weights to sum to 1. 

2.4 Resample with replacement N particles {0^^^;i = 1,...,N} from 

the set {^f^',i — 1, • . . , N} according to the normalized importance 
weights. 
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The advantage of the PMC over the MCMC algorithms is that it produces 
(approximately) unbia sed samples at each iteration and the adaptive perspective 
can be achieved easily. iDouT^aD llool) proposed a Z3-kernel PMC that auto- 
matically tunes the scaling parameters and induces a reduction in the asymptotic 
variance via a mixture of importance functions. 



Resampled particles according to the importance weights are relatively more 
informative than those that are not resampled and this available information 
can be used to construct the g at the next iteration. A common choice for 
g is a set of individual normal distributions for Oi, . . . ,9n so that the impor- 
tance function of particle-i is defined as a normal distribution centered at 9i 



(jCappe et all . |2004 iDouc et all . 120071 ). 



In this paper we consider a different approach such that g is based on the 
population of 9i, . . . ,6n and its evaluation of particle-i involves all resampled 
particles, not just 9i. The nonparametric kernel estimate of a density of 9 is 
given by fitting a mixture of TV components in equal proportions 1/N, 



1 



N 



i=i 



where /C is some kernel. When Kis a, standard Gaussian function with mean 
zero and variance 1, the equation simplifies to a mixture of Gaussian functions 
with mean 9 and variance 



1 ^ 

9{e') = -Y.^[9' ~9,,x^) . 



(4) 



The main interest in the kernel estimate of density literature is the choice 
of optimal values for 9' so that the density is neither undersmoothed nor over- 
smoothed (jScottl . I1992I ). Since the choice of g with tails that are fatter than tt 



guarantees the existence of the variance of the estima te from a the oretical point 
of view, we multiply the optimal value suggested bv IScott ( 1992f ) by fc > 1, so 
that 



where a'i denotes the variance of ^i, . . . , 9^^ and D is the dimension of tt. 



As with other IS based algorithms, the downside of the PMC algorithm is 
that degeneracy may occur due to instability of the importance weight for the 
multimodal target as the dimension increases. The degeneracy is observed as 
the importance weight is concentrated on very few particles or a single particle. 
Theoretically this phenomena can be avo ided when the number of particles in- 
creases exponentially with the dimension (|Li et al.l . 120051 ) but this usually causes 
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storage problems. 



As an attempt to improve this degeneracy phenomena we now adapt the 
repulsive effect described in Section 2.4.1 to the PMC algorithm. 

2.5.1 PMC with the repulsive effect 

The idea of this hybrid approach is to generate particles from the importance 
function with the repulsive effect in the standard PMC context. Due to the 
repulsive effect proposed particles are not too close to each other and are still in 
a relatively informative region via a kernel density estimate based on resampled 
particles from the last iteration. Let (fi,..., ipjf be random variables almost 
from g as in The pseudo distribution with the repulsive effect is 

|^(l-^) + ^ne-«/f(^^)"^--^^ll'j . (5) 

This form consists of one more parameter than the tt^ used in the MCMC 
algorithm and the size of hole is adjusted by ^ and v; the ^ indicates the width 
of holes, and the i^, (0 < < 1) indicates the depth of holes. If both and ^ 
are very small, negligible holes are created around particles, and proposals are 
almost from the g. For large v and ^, particles are pushed away with greater 
repulsive effect due to bigger holes. In particular if ^ is very large with respect to 
the number of holes N, the holes overlap with each other and g is oversmoothed. 
These hyperparameters need to be tuned with respect to the total number of 
holes and a scaling parameter k. 

In order to apply g to the IS successfully, its normalizing constant should 
be known. Since it is not solvable analytically, we approximate the normalizing 

constant, C, 



= c 



-E 



M 



M ^^g{(p„i) - g{ip„i) 



= C (6) 



where p{(pi, . . . , (Pm) 9 ~ 9 Sind M > N. For reasonable approximation of C, 
adjusting M with respect to the width and the depth of holes is necessary. 



PMC Vifith the repulsive effect 



12 



1 Initializaiion, 

Generate initial particles from an initial distribution p, 9^^ ~ p(-). 

2 For t = l,...,r, 

2.1 Construct the g^*^ based on 0^* ^\ ■ ■ ■ , 0% and sample i^^*^ , • • • , i^jv^ 
from £f(*\ 

2.2 Construct the 5^*^ by creating holes at \ • • ■ , (^^\ Generate (p^^j^i, ■ ■ ■ , 
from 5^*) that satisfy g^^\'p) > g^*\(p). Estimate the C" based on 

2.3 Generate f/?^*', • • • , ip^^ from 5*^*'. 

2.4 Compute the importance weight 



J'^ oc C- 



git) 



2.5 Normalize the importance weights to sum to 1. 

2.6 Resample with replacement N particles {6*-*^ i = 1,...,A^} from 

the set {iPi*^;i = 1, • • • , N} according to the normalized importance 
weights. 



In this approach samples (pi, . . . , ipM are generated to create holes and es- 
timate C . Through this process proposals ipi, . . . , <^jv can be generated from 
g. As will be shown later, when ^ is small, simply M = N gives a reasonable 
approximation and no extra particles are required. 

Both PMC and its hybrid approach produce estimates of the posterior ex- 
pected values using IS. To assess the characteristics of the algorithm over iter- 
ations, the simulation is summarized by a series of estimates, 

(N N \ 

j=l i=l / 



3 Illustration of hybrid algorithms 

In this section we examine the hybrid methods described in Section 2 and study 
how individual components of the algorithms influence the properties of the over- 
all algorithm. Algorithms are programed using version 7.0.4 MATLAB software 
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and run by the Lyra that is an SGI Ahix XE Cluster containing 112 x 64 bit 
Intel Xeon cores. The investigation is first carried via simulation of the hybrid 
MCMC algorithms using a toy example in Section 3.1 and 3.2, and the sensitiv- 
ity of a tempering factor of the RP is given in Section 3.3. Section 3.4 presents 
a simulation study of the PMC and its hybrid approach. 

3.1 Simulation study 1 

We commence our investigation by choosing a target distribution that comprises 
two well separated modes, described by a mixture of two dimensional normal 
distributions 

7r=^Ar([0,0r,72) + ^A^([5,5F,/2). 

The expectations of the two components are E„((?i) = E7r(6'2) = 2.5. A normal 
proposal distribution qi{0, •) = M{9, s/2), s > is used for the MHA, the RP, 
and as the first-stage proposal for the DRA and the PS. 

We consider two simulation studies. The first study focuses on the perfor- 
mance of the algorithms in terms of statistical efficiency, and relative cost of 
computation. The second study focuses on the mobility of the chains in a spe- 
cial setup where initial values are generated from a certain part of state space 
and demonstrates the ability of the algorithm to detect the two modes of the 
target distribution. 

3.1.1 Performance study 

In this paper the comparison of the performance of algorithms is based on 100 
replicated simulations of each algorithm. Each simulation result is obtained 
after running the algorithm for 1200 seconds using 10 particles. This allows a 
comparison of the algorithms in a more realistic way by fixing the rimning time 
and observing the consistency of estimates throughout the replications. 

For each simulation, the performance is defined as the accuracy of estimat- 
ing H, K^[H{9i,02)] = and the variance (cr|j), the efficiency of moves 
indicated by the rate of acceptance (A), the computational demand indicated 
by the total number of iterations (T), the mixing speed of the chain estimated 
by the integrated autocorrelation time {tt^{H)), and the loss in efficiency due 
to the correlation along the chain by the effective sample size {ESSr) where 
ESSh = {T -Tn)/T^{H) and To is the length of the burn-in. The term indi- 
cates the number of correlated samples with the same variance-reducing power 
as one independent sample, and is a measure of the autocorrelation in the chain. 
The ESSh can be interpreted as the number of equivalent iid samples from the 
target distribution for 1200 seconds of running and provides a practical measure 
for the comparison of algorithms in terms of both statistical and computational 
efficiency. Except for T and A, all measurements are estimated after ignoring 
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True value 
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(456.0547) 


(< lo-'') 


(0,0032) 


(0.0013) 


(0.9197) 


(10.8990) 


MHA 


2.4259 X 10^ 


0.30 


2.5002 


0.7250 


296.5254 


2181.7 




(1.2 X 10*) 


(< 10-4) 


(0.0007) 


(0.0005) 


(0.5736) 


(43.1207) 


MHA"^ 


1.2172 X 10^ 


0.29 


2.4904 


0.7213 


266.5477 


371.2440 


(C = 10-^) 


(581.89) 


(< 10-4) 


(0.0036) 


(0.0028) 


(2.4224) 


(3.8696) 


DRA 


1.0012 X lO'' 


0.49 


2.4978 


0.7187 


196.8483 


5086.9 




(24.8503) 


(< 10-4) 


(0.0009) 


(0.0006) 


(0.5143) 


(13.1896) 


DRA^P 


1.9914 X 10^ 


0.34 


2.5010 


0.7223 


253.4174 


786.2 


(/i=4) 


(1031.5) 


(< 10-4) 


(0.0020) 


(0.0018) 


(1.3601) 


(6.0554) 




6.4729 X 10^ 


0.61 


2.5025 


0.7235 


239.1775 


2708.4 




(3091.8) 


(< 10-4) 


(0.0010) 


(0.0009) 


(0.8232) 


(17.3921) 


MALA 


2.0415 X 10^ 


0.29 


2.4978 


0.7265 


828.4097 


505.0 




(241.9826) 


(< 10-4) 


(0.0093) 


(0.0068) 


(56.5102) 


(60.2510) 



Table 1: Comparing performance in the two dimensional example with the 
proposal variance s — 4. The values in the bracket are the MSE of measures. 



the first 500 iterations (Tq = 500). 

Based on the 100 replicates, these measures are summarized by taking the 
respective averages {H, aj^, A, T, -f^(i?), and ESSh) and mean square error 
(MSE). The MSE measures the consistency of performances over 100 replicates 
and the stability of the algorithm. For instance a small MSE can be interpreted 
to indicate that the algorithm will produce a reliable output with a given ex- 
pected statistical efficiency. The performance of the single and parallel chain 
implementation of the MHA, the MHA with the RP, the DRA using three moves 
for the second-step proposal and the MALA using a traditional proposal and 
smoothed Langevin proposal are summarized in Table [T] and Table [5] for two 
different sizes of scaling parameters, s = 4, 2 and h — 4,2, respectively. All 
chains converged to the target reasonably well based on informal diagnostics. 
The performance of each algorithm is discussed below. 

MHA It is noticeable that due to the parallel computing the MHA takes the 
shortest CPU time per iteration and produces the largest sample size, 20 
times more proposals than the single chain, MHA''. The improvement 
in bias of samples from the parallel chains is not observable compared to 
a single chain. To optimize the p erformance the proposal vari ance s is 



usually tuned such that A « 0.3 (jRoberts and Rosenthall . 120011 ) and for 



this example s = 4 is the optimal value. 



MALA When h is not sufficiently large it is seen that samples are easily 
trapped in a certain part of the state space. It is known that often the 
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(346.7434) 


(< 10-4) 


(0.0053) 


(0.0014) 


(1.7493) 


(3.0647) 


MHA 


2.481 X 10^ 


0.43 


2.4992 


0.7253 


872.3864 


2845.7 




(2.52 X 10^) 


(< 10-4) 


(0.0012) 


(0.0009) 


(2.5346) 


(29.9846) 


MHA""" 


1.2294 X 10^ 


0.42 


2.4952 


0.7230 


786.3059 


157.7317 


(C = 10"^) 


(1.37 X 10^) 


(0.0016) 


(0.0062) 


(0.0044) 


(9.7029) 


(2.3711) 


DRA 


1.0012 X lO'' 


0.63 


2.4983 


0.7148 


700.9815 


1429.6 




(41.2523) 


(< 10-4) 


(0.0015) 


(0.0013) 


(2.6121) 


(5.2876) 


DRA^P 


2.2856 X 10^ 


0.61 


2.5021 


0.7235 


553.5437 


413.6313 




(1.77 X 10^) 


(< 10-4) 


(0.0031) 


(0.0024) 


(3.5946) 


(4.0817) 


jj j^j^Pinball 


6.8763 X 10^ 


0.68 


2.4990 


0.7218 


674.7575 


1020.0 




(5.21 X 10^) 


(< 10-4) 


(0.0021) 


(0.0015) 


(3.0249) 


(8.6733) 


MALA 


41.9834 X 10^ 


0.67 


2.5069 


0.6916 


38.0650 


5839.6 


{h = 2) 


(1.66 X 10^) 


(< 10-4) 


(0.0166) 


(0.0122) 


(1.3820) 


(198.9234) 



Table 2: Comparing performance in the two dimensional example with the 
proposal variance s — 2. The values in the bracket are the MSE of measures. 



MALA performs poorly for multimodal problems, as the chain is pulle d 



back toward the nearest mode and may become stuck ( Skare et al. . 2000l ) 



This is observed by a larger MSE of measures compared with other MCMC 
algorithms in general. In particular when s = 2 a very small rj induces a 
huge ESS and corresponding MSE of H. 



MHA with RP (MHA^^) It is seen that the repulsive effect induces a fast 
mixing chain. The repulsive proposal reduces the autocorrelation along 
the chain by pushing particles apart around a neighborhood. Its effect is 
more obvious when the proposal variance is relatively small. 

The repulsive proposal imposes an additional tuning parameter ^ on the 
MHA^^ compared to the MHA. The algorithm is illustrated with ^ = 
10-4, and approximately 99.5% of proposals that are accepted with tt^ 
are accepted with tt. The choice of ^ is critical in implementing the RP 
and we will discusss this matter later. 

The drawback of the MHA^^ is that it is expensive to compute. For D 
dimensional problems with N particles and an arbitrary N x D matrix 
9^^^ the repulsive part of a pseudo distribution adds 0{ND) operations. 

This algorithm also can be unstable and as for the MALA, the MSE of 
measures tends to be large in some circumstances. As illustration, given 
with ^ carefully tuned, we observed six exceptional measures out of 100 
replicates using s = 2,4; see Table [3TTTT] A very low rate of acceptance, 
a high Ttt, and an extremely small ESS indicate that the chain is stuck in 
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118283 


0.05 


3.2699 


0.2863 


3641 


32.32 




114805 


0.06 


3.4432 


0.6235 


5907 


19.35 


4 


122372 


0.0009 


1.4894 


0.4307 


10887 


11.19 




122107 


0.0461 


3.1459 


0.5232 


1817 


66.94 




124531 


0.1922 


2.9552 


1.1092 


602 


206.17 




124411 


0.1878 


1.9514 


0.5510 


423 


292.69 



Table 3: Exceptional estimates of MHA^^. 



a certain region and does not adequately explore the state space. How- 
ever these instances occurred relatively rarely, and the most of times the 
algorithm performs reasonably well. 

DRA By making the second attempt to move instead of remaining in the cur- 
rent state, algorithms with a delayed rejection mechanism produce less 
correlated samples and have a higher rate of acceptance. The demand in 
computation is also increased and differs with different types of moves. We 
examined three types of moves for the second-step proposal, the normal 
random walk as for the first proposal {DRA) , the geometric deterministic 
proposal (I?i?y4^™''°" ) and the Langevin proposal (DRA^^). 

With the geometric deterministic proposals, over 60% of proposed moves 
were accepted regardless of the value of the proposal variance. The geo- 
metric deterministic proposal pushes away from the closest particle; how- 
ever with a moderate repulsive probability proposed particles may remain 
in the neighborhood of existing particles, in which case it is highly likely 
to be accepted. 

Generally the Langevin proposal seems a suitable choice for the second- 
stage proposal in terms of the improvement in the mixing speed of chains 
when the proposal variance is not relatively large. However it requires a 
greater amount of computation than other types of proposal, and despite 
a diminished r^r the ESS is relatively small. In this case the loss in the 
computational efficiency overwhelms the gain in the statistical efficiency. 
Given the nature of the Langevin diffusion tuning of h is essential. 

The normal random walk for a second stage proposal is faster to com- 
pute and results in improved mixing with a sufhciently flexible choice 
of the proposal variance. As seen in Table [l] when particles can move 
around the state space efficiently, neither the deterministic proposal nor 
the Langevin diffusion approximation will induce a substantive reduction 
in the correlation of the samples. 
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3.1.2 Ability to detect modes 

We now consider a burn-in period in which chains are still dependent on initial 
states. A strong dependence of chains on the starting value is a well known 
problem with the MHA. We examine the selected hybrid approaches to study 
how quickly chains move from initial values and find other local modes. 

We run algorithms in the same manner with initial particles from one partic- 
ular mode A/'([0, 0], 12), s = 2 for the random walk, and h — 2 ioi the Langevin 
proposal. Through 400 repeated simulations, the ability to detect the other 
mode A/'([5, 5], 12) is estimated. The allocation of particles is determined by the 
distance from the center of mode. 

Figure [T] shows a histogram of the number of iterations before the second 
mode is detected for the first time during 400 replicates of a parallel running 
of the MHA, and the DRA with the Langevin proposal. Generally particles 
escaped from the first mode relatively quickly but there is a noticeable number 
of chains for which this was not the case. The chains with a slow mixing rate 
{MHA) tended to have a longer tail than the faster chain (DRA^^). 

Table U] shows the total number of trials in which the second mode was 
detected within the first 50 iterations during 400 repeats. Approximately 86.5% 
of the MALA simulations failed to detect both modes. Generally the DRA 
had a better ability to detect modes than the MHA. In particular 72.5% of the 
DRA'"^ trials found the second mode within 50 iterations, compared to only 
50.8% for the MHA. 




50 100 150 200 250 300 50 100 150 200 250 300 



Figure 1: Histogram of the number of iterations before the second mode is 
discovered via running a parallel chain of (a) the MHA, and (b) the DRA with 
the Langevin proposal over 400 replicates. 



18 



MHA 


203 


MHA"P 
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DRA 
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250 


DRA^P 


290 


MALA 


54 



Table 4: The total number of cases in which the second mode is detected within 
the first 50 iterations during 400 replicates. 
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PS 


4 


95226 


0.56 


2.4904 


0.7213 


266.5477 


371.2440 


(e = io-5) 




(891.984) 


(0.0067) 


(0.0268) 


(0.0104) 


(6.8292) 


(8.2292) 




2 


93550 


0.64 


2.4971 


0.7148 


679.0638 


141.1047 






(972.6976) 


(0.0061) 


(0.0240) 


(0.0074) 


(11.1321) 


(2.6741) 



Table 5: Performance of the PS. 



3.2 Simulation study : PS 

The PS is interesting to study as a hybrid algorithm because it combines fea- 
tures of four schemes, the MH, DRA, RP, and particle system. We study the 
relative contribution of these individual algorithms and their combined charac- 
ter through a simulation study analogous to that described in Sections 3.1.1 and 
3.1.2. 

The performance study of the PS is carried out in the same manner to Sec- 
tion 3.1.1, and a summary of results is given in Table [5] Comparing the results 
of the PS to component algorithms in Tables [T] and [21 the PS produces less 
correlated samples than the MHA and has a high rate of acceptance due to the 
DRA and pinball effect. A previous simulation study in Section 3.1 has shown 
that the repulsive proposal and the geometric deterministic proposal are better 
choices than the small random walk proposal; however when the size of random 
walk is sufficiently large, the normal random walk proposal is better. This was 
apparent in this simulation study. In particular when s = 2, was smaller and 
A was greater than for the individual algorithms, a single and parallel chain of 
the MHA, MHA with RP, and the DRA. This was supported by the results of 
the mode detecting test in which 245 out of the 400 repeats found the second 
mode within 50 iterations. It seems that overall the statistical efficiency of the 
PS is most affected by the DRA. 



As seen for the MHA with the RP, some instability of performance and 
exceptional values of ESSh are seen with the PS. This phenomenon is illustrated 
in Table m 

The repulsive proposal adds a tuning parameter ^ and the geometric deter- 
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1.6395 


0.4539 
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110.98 



Table 6: Exceptional estimates of the PS. 



ministic form of the proposal considered here gives a restriction on the size of 
dimension. These issues thus impacted on the implementation of the PS. The 
major drawback of the PS was a large workload in terms of computation due to 
the repulsive effect, and high cost moves for the proposal at the second stage. 
As a result the PS produced the smallest number of samples among the MCMC 
simulations for the same computational cost. Since the tempering factor ^ con- 
trols the repulsive effect, it is natural to study the sensitivity of ^ and how it 
affects the efficiency of the algorithm. 

3.3 Tempering factor ^ in the repulsive proposal 

In this section we demonstrate the sensitivity of the tempering factor ^ through 
a set of examples. 

In the MHA, the acceptance probability is 



where p{Oi,ipi 



a{9i,ipi) ^ mm {l,p{9i,ipi)} 
Tr{ipi)qi{ipi,9i) 



Tr{9i)qi{9i,ipi) ' 
The acceptance probability using tt^ is 



where p*{9i,ipi) 



a*{9i,ipi) = m:m{l,p*{9i,ipi)} 

'!r^i(pi)qiiipi,9i) 
Tr'i{9,)qi{9,,^,)- 



The sensitivity of f is tested by comparing p and p* for different values for 
^. Using the same target as in Section 3.1 we consider ten existing particles in 
which are samples from tt and random walk proposals depending on selected 
three particles. This is depicted in Figure [U existing particles and proposals 
are indicated by crosses and dots respectively, and three moves (a)-(c) are rep- 
resented by arrows. 

The result is summarized in Table [71 and the acceptance probability is 
p = Tr{(pi)/Tr{9i) and p* — TT^{ipi)/n^{9i) due to the use of random walk pro- 
posals. It is seen that when f is large the tt^ is dominated by the repulsive term 
as is the acceptance ratio. In particular for the move (c) the proposal in a high 
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0.0032 
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0.0039 


0.0603 
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0.0647 


0.0129 
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1.1831 
0.0504 


1.1831 
0.0504 


0.4837 
0.0178 


1.4775 X 10-"* 
1.4775 X 10-6 




0426 


0426 


0368 


0100 



Table 7: Comparison of p 
sive impact. 



and p* = — for different size of repul- 



probability region is too close to existing particles and p* decreases rapidly with 
an increasing ^. In contrast, the proposal in a slightly lower probability region 
by the move (b) is well separated from existing particles and accepted with a 
probability of 1 with ^ = 10-^. 




Figure 2: Ten iid samples (cross) from tt as in Section 3.1 and three proposals 
by the normal random walk (dot) from selected particles which are marked by 
an arrow. 

In order to observe its effect on the MCMC simulations, we run the MHA 
with the RP with different values for ^ and estimate the rate of acceptance at 
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10 10"' 10 10* 10""^ 10 

Figure 3: The rate of acceptance at the Correction stage 

the Correction Step given accepted particles at the Proposal Step, 

number of accepted particles at Correction Step 
number of accepted particles at Proposal Step 

Figure [3] shows how 77 varies as the repulsive effect increases based on sim- 
ulations of 500 iterations. When ^ < 10^^ most of the proposals that are 
accepted at the Proposal Step are accepted at the Correction Step. As ^ exceeds 
10~^, fewer proposals are accepted and the efficiency of the algorithm decreases 
rapidly. 

For efficient sampling, the tempering factor ^ should be tuned, so the pro- 
posals are accepted using both tt^ and tt at a reasonable rate. In other words, 
TT^ should be relatively close to tt with a value for ^ large enough to give a 
repulsive effect on the particles. 

The tempering factor can be tuned over iterations without changing the 
stationarity of the chains. For instance a larger value can be used at the begin- 
ning of the simulation in a way that induces particles to forget the initial states 
quickly. 

3.4 Simulation study : PMC algorithm 
3.4.1 Performance study 

The performance comparison of the PMC algorithm and its hybrid approach 
is carried out in the same manner to that described in Section 3.1, using 100 
replicates. The algorithm is run for 500 seconds using 50 particles and the first 
100 iterations are ignored. The hyperparameters ^ = 10"^ and v = 0.3 are used 
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for the repulsive effect. 

The simulation results based on the 100 repeats are summarized in Table |51 
Here is the integrated autocorrelation time along a series of estimates using 
the IS, and the ESS estimates differences between the trial distribution based 
on particles and the target distribution. The rib denotes the total number of 
biased estimates during T iterations. The bias is determined by the distance be- 
tween the estimate of both components using the IS and their expected values, 
[E^(0i),E^(02)]- If the distance is greater than 3.51, the empirical distribution 
is asserted to be strongly biased towards one mode over the other mode. The 
ratio Hh/T indicates the rate of poor exploration and relates to t^. 

From Table |8] it is seen that the estimate H of both algorithms converge to 
the expected values reasonably well. It is seen that and nt/T of PMC^ are 
reduced, and in particular when k = 2.5, is reduced significantly. This is the 
most noticeable improvement of fast exploration of particles in the multimodal 
space among the hybrid algorithms considered here. Moreover there is no sign 
of the instability of the performance that was a crucial concern shown in the 
MCMC algorithms. 

When the scaling parameter k is large so that the g overlap over the prob- 
ability region, particles move around the space quickly without the repulsive 
effect. However interestingly the statistical efficiency of the PMC using a wider 
tailed g is not substantively improved compared to the hybrid approach using a 
shorter tailed g. This means that the statistical performance of this hybrid algo- 
rithm is less sensitive to the scaling parameter. Recalling that the performance 
of the hybrid versions of the MCMC algorithms are sensitive to the proposal 
variance, this result is useful information in practice. 

The mode detecting test was carried out in the same manner as in Section 
3.1.2 with resampled particles {0i, ...,0n) for A: = 2.5. For the PMC, 120 out 
of 400 repeats detected the mode within the first 5 iterations, with 335 for the 
PMC with the repulsive effect (Figure 2]). This result supports the improvement 
in the exploration of particles by the use of the repulsive effect. 

As observed for the MCMC algorithms, adapting the repulsive proposal 
{MHA^^ and PS) induces an increase in computational workload per iteration 
by approximately a factor of 2.5 on average. From this simulation result, we 
may argue that with 50 particles the gain in statistical efficiency overcomes the 
drawback of increased demand in computation. However this is not guaranteed 
since the computational cost increases with increasing N. 
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2.5011 


1.7931 


2.4847 


0.0408 
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(50.4404) 


(0.0015) 


(0.0064) 


(0.0649) 


(0.0003) 


(0.0027) 



Table 8: Comparing of performance of PMC and PMC^ in the two dimensional 
example. The values in brackets are MSE of measures. 




5 10 15 20 40 



Figure 4: Histogram of the numbers of iterations before the second mode is 
detected by running the PMC with (a) the repulsive effect, and (b) the PMC 
for k = 2.5 during 400 replicates. 

3.4.2 Sensitivity of the repulsive effect 

In order to study the hyperparameters ^ and v in the repulsive effect, we compare 
TT and TT^^ using iV = 50 particles for different values 

^2«(&0 oc iT{e,) ^(1 -v) + u\le-^/-ie.)P'-o.r^ . 

As an alternative to ([6]) the normalizing constant of tt^^ can be approximated 
numerically using the finite grid system over the state space. The ratio of the 
approximate normalizing constants of tt and tt^^ measures the relative size of 
holes created over the target distribution and is an indicator of the impact of 
the repulsive effect. 

Figure [5] presents the contour plots of tt^^ using various values for ^. It is 
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seen that the density becomes smooth with no indication of holes when ^ is 
either very small or very large. For further investigation we estimate the ratio 
of normalizing constants for various sets of hyperparameters and number of par- 
ticles (Figure H]). As ^ increases, wide holes are overlapped and tt^^ « (1 — i')tt 
which has negligible repulsive effect. As N increases the ratio decays to 1 — 
quickly with increasing ^. This suggests the use of a smaller ^ when TV is large 
for a significant repulsive effect. 




-1 -0-5 0.5 1 -1 -0.5 0-5 1 -1 -0.5 05 1 -1 



? = J = 10"** 4 = 10"^ 



Figure 5: The contour plot of tt^^ with ly = 0.3 and $ = 0, IQ-^, 10-^ IQ-^. 



The approximation C" as in equation ([6]) is useful in practice where the nu- 
merical approximation cannot be evaluated. Its accuracy relates to the number 
of samples M with respect to the size of holes f and v. For a simple target like 
our toy example the M can be tuned by adjusting the C to be close to the ratio 
of normalizing constant approximations based on the grid system. This process 
is sensible because the C" can be seen as the ratio of analytically evaluated 
normalizing constants of tt and tt^^. 

In the previous section, the performance study of the PMC^ is undertaken 
based on simulations using ^ = 10"^ and ly = 0.3. With these values, tt^^ is 
very similar to tt and still the repulsive effect is observable. The value of M is 
chosen to be 50 holes and no additional particles are needed to describe the size 
of holes. 



4 Application : Aerosol particle size 

Aerosols are small particles in suspension in the atmosphere and have both 
direct and indirect effects on the earth's climate. Aerosol size distributions de- 
scribe the number of particles observed to have a certain radius, for various size 
ranges, and are studied to understand the aerosol dynamics in environmental 
and health modeling. 
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Figure 6: The ratio of approximate normalizing constant of tt^^ over tt, C" « 
/ Tr^^{e)de/ J Tr{0)d0, for (a) v = 0.1, (b) v = 0.3, (c) v ^ 0.5 using iV = 10 
(solid line), iV = 50 (dashed line) N = 100 (dotted line) and N = 200 (dash-dot 
line) particles. 



The data represented in Figure [7] comprise a full day of measurements, taken 
at ten minutes intervals. The dat a set was collected at a Bo real forest measure- 
ment site at Hyytiala, Finland (|Nilsson and Kulmalal . [2006 ) : a random subsam- 



ple of 2000 observation from the total of 19998 measurements was taken for 
easier implementation of the algorithms. 

The dataset y = {yi, . . . ,2/2000) is described with a mixture of two normal 
distributions 

\N{tiu<jl) + {i-m{m,<yl) 

where < A < 1. The unknown parameters (mi 1 M2 1 o"i 1 0-2 and A) are estimated 
using the missing data approach ( Marin and Robert . 20071 ) with relatively vague 



priors 

^J^^,^J^2-^f[Mean{y),Var{y)) , ai, ^2 - ^(2, 2) , A ^ t/[0, 1] . 

In light of the high dimensionality of the problem we use block updating 
in which an acceptance probabilities based on the ratio of the full conditional 
distributions. In addition, considering smaller dimensional components may 



26 
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Diameter (nm) 

Figure 7: Histogram of the aerosol diameter dataset, along with empirical den- 
sity via the MHA (solid line) and PS (dotted line). 

reduce the sensitivity of the repulsive term in the RP compared to a full dimen- 
sional update and allows the use of the geometrical deterministic proposal. 



The MHA and four hybrid algorithms (the MALA, DRA, DRA with the 
Langevin proposal and PS) were run using 10 particles for 4200 seconds. The 
sets of the parameters ([/.ti, /i2], [fi, 12] and A) were updated in turn. Details of 
implementing the algorithms are as follows. 

MHA and DRA : Proposals are generated using the normal random walk 
with a covariance matrix of 25 x 10~'*/2 for [/ii,/X2], 2.25 x 10~'*/2 for 
[ci, (T2] and a variance of 10~^ for A. These updates are used for DRA^^ 
and PS as the first proposal scheme. 



MALA and DRA with Langevin proposal : At the second stage [/ii,/i2], 
[o'i,o'2] and A are updated using the Langevin proposal with ft,^, hu and 
hp respectively, 

/i^ = 25 X IQ-^h , K = 2.25 X 10-^/2 , hx = lO"" . 

PS : The two dimensional updates [^1,^2] and [(Ti,f72] are implemented ac- 
cording to the algorithm described in Section 2.4 and A is updated as in 
the MHA, without the delayed rejection mechanism and the RP. 



For the RP, values for the tempering factor ^ for /i and a are chosen 
separately using the acceptance rate at the Correction Step as in equation 
d?]). Based on Figure [8] we set = Cct = 10^"*^. Note that these values 
are quite extreme because they are fixed valued parameters, not balanced 
scale parameters. 
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Figure 8: The rate of acceptance at the Correction stage rj in equation ([7]) for 
(a) [fii,^2] and (b) [cri,cr2] updating. 





T 




ESS 


MALA 


46071 


65.6773 


686.2 


DRA^P 


87613 


114.1824 


758.5 


DRA 


125020 


130.7077 


948.8 


PS 


20775 


127.7793 


154.8 


MHA 


203118 


185.4259 


1090.0 



Table 9: Performance of selected hybrid algorithms [MALA, DRA^^, DRA 
and PS) and the MHA 



The results of the MCMC simulations are summarized in TablelHland Figures 
[9] and [TOl Since the two modes are fairly clearly separat ed, label switching was 



not observed for in any of the simulations (IGe wekel , 2007 ) . Overall the estimates 
of the five parameters are similar and the 95% credible interval of parameters 
obtained from all algorithms overlap each other. The denotes the average of 
T^'s of all parameters and ESS = {T - Tq)/?^ (Tq 1000). 

In general, the features of the hybrid algorithms shown from this simulation 
are similar to those observed using the toy example in Sections 3.1 and 3.2. 
The PS is an expensive algorithm to compute and the MHA demands the least 
CPU time per iteration. The Langevin proposal was very effective in reducing 
Tf and the DRA did not improve the mixing as it did in the toy example. This 
is supported by the observation that the two modes are well separated and the 
empirical posteriors of each parameter have a single mode as seen in Figure 
[9l By the property of the Langevin diffusion, particles are pushed back to the 
mode and consequently can explore a single mode more quickly. 
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Figure 9: Estimate of parameters (marked as a cross) with the 95% credible 
interval of (a) MALA, (b) DRA, (c) DRA^^ , (d) PS, and (e) MHA (marked 
as a solid line). 

5 Discussion 

In this paper we have considered the problem of designing hybrid algorithms 
through studying selected hybrid methods, the DRA with three types of move 
for the second-stage proposal, the MALA, the PS, and the PMC. Using a two di- 
mensional example, each algorithm was evaluated and the relative contributions 
of individual components to the overall performance of the hybrid algorithm 
were estimated. The performance was defined by the accuracy of estimation, 
the efficiency of the proposal move, the demand in computation, the rate of 
mixing of the chains and the ability of the algorithm to detect modes in a spe- 
cial setup. Algorithms were also applied a real problem of describing an aerosol 
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Figure 10: Histogram of parameters {^i, ai, X, ^ia2) oi MHA. 



particle size distribution. We observed that the results of this analysis coincided 
with those obtained using a toy example. 

Based on these simulation studies, we subjectively rated the performance 
of the hybrid MCMC algorithms relative to the MHA, and the PMC adapting 
the repulsive effect relative to the PMC. The results are presented in Table [TUl 
Following the three criteria for a efficient algorithm defined in Section 1, we 
considered the efficiency of the proposal move (EPM) , the correlation reduction 
of a chain (CR), the rate of convergence (RC), the cost effectiveness (CE), 
the simplicity of programming (SP), the flexibility of hyperparameters (FH), 
the consistency of performance (CP) and the preference between a single mode 
and a multimodal problem (Mode). As the algorithm improves with respect to 
the criterion, the rating increases positively, with zero indicating that there is 
no substantive difference to either the MHA or the PMC. Among the MCMC 
algorithms it is seen that DRA and DRA^^ are sensible algorithms to use for 
multimodal problems and the MALA for unimodal problems. For the hybrid 
versions via the adaptation of the repulsive effect the overall efficiency based on 
criteria is improved with the PMC relative to the MHA. 

We identified the following overall issues to be considered in designing hybrid 
algorithms. 



(i) Combining features of individual algorithms may lead to complicated char- 
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Algorithm 


Statistical Eff. 


Computation 


Applicability 




EPM 


CR 


RC 


CE 


SP 


FH 


CP 


Mode 


MALA 


1 


1 


1 


-2 


-1 





-1 


Single 


MHA^P 





1 





-2 


-1 


-1 


-2 


Both 


DRA 


1 


1 





-1 


-1 








Both 


DRA^P 


2 


1 





-2 


-1 








Both 


j^j^Pinball 


2 


1 





-2 


-1 








Both 


PS 


2 


1 





-3 


-2 


-1 


-2 


Both 


PMC^ 




2 





-1 


-1 


-1 





Both 



Table 10: The relative performance rating of the hybrid algorithms to the MHA 
and the PMC. 



acteristics in a hybrid algorithm. For instance, it was seen that the per- 
formance of some algorithms was sensitive with respect to the value of 
the tuning parameters such as s, /i, and ^, and this also applied to the 
hybrid algorithms. For the MHA with the repulsive effect the improve- 
ments from the statistical perspective were sensitive to both the size of 
the normal random walk (a well known property of the MHA) and the 
tempering factor. Moreover, some inconsistency of estimates throughout 
replicates occurred, albeit rarely. 

(ii) Each individual algorithm may have a strong individual advantage with 

respect to a particular performance criterion, but this does not guaran- 
tee that the hybrid method will enjoy a joint benefit of these strategies. 
The contribution of some components may be insignificant, and certain 
components may dominate the character of the hybrid method. This can 
be seen with the PS as the DRA most strongly contributes to statistical 
efficiency. 

(iii) From the perspectives of applicability and implementation, the combina- 
tion of algorithms may add complexity in setup, programming and com- 
putational expense. These phenomena are easily observed as all hybrid 
approaches considered in this paper were computationally demanding, al- 
though the magnitude of the demand differed with the types of move and 
techniques. In practice it is important to be aware of these issues to opti- 
mize the performance in real time so that the improvement in one criterion 
does not become negligible due to the drawback in computation. This was 
observed, for example, with the DRA'"^ . 

These general considerations in building an efficient algorithm are easily applied 
to existing hybrid methods. The adaptive MCMC algorithm is another good 
example. Its motivation is to automate and improve the tuning of the algo- 
rithm by learning from the history of the chain itself. It can be easily coded and 
is stati stically efficient withou t a str enuo us increase in the computational ex- 
pense. iRoberts and Rosenthall ( 20071 ) and Rosenthal ( 2008 ) demonstrated that 
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an algorithm can learn and approach an optimal algorithm via automatic tun- 
ing scaling parameters to optimal values. After a sufficiently long adaptation 
period it will converge much faster than a non-adapted algorithm. However 
this fast converge nce feature may increase the risk that chains may converge to 
the wrong values ( Robert and Casellal [20o3 iRosenthall . l2008t ). Moreover, smce 
proposals are accepted using the history of the chain the sampler is no longer 
Markovian and standard convergence techniques cannot be used. The adaptive 
MCM C algorithms conv e rge only if the adaptat i ons a re done at regeneration 



times ( Gilks et al. . 1998t Brockwell and Kadane, 2005) or under certain tech 



nical types of the adaptation procedure (IHaario et al 



20011: Andrieu and E 



2006HAtchade and Rosentha]ll2005l : [Roberts and RosenthallbOOTi ). Overall the 



adaptive MCMC algorithm can be efficient from the statistical perspective and 
cost effective only when it is handled with care. In other words, these advantages 
come with a reduction in robust reliability which may affect its applicability. 



In this paper, hybrid algorithms comprising components running in parallel 
were considered. It would also be of interest to consider and compare in an 
analogous fashion hybrid algorithms comprising components that are run in 
series. Without compromising the stationarity of the chain, a set of components 
similar to those described here could be implemented either in turn or randomly, 
with or without cues from the existing chain or estimated distribution. The 
resultant algorithms would have diverse characteristics related to the selected 
components as well as the sequence in which they were run. As considered here, 
parallels with current adaptive MCMC algorithms could also be drawn. 
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